Order-Free RNN with Visual Attention for Multi-Label Classification

Authors

Shang-Fu Chen

Yi-Chen Chen

Chih-Kuan Yeh

Yu-Chiang Frank Wang

Abstract

We propose a recurrent neural network (RNN) based model for image multi-label classification. Our model uniquely integrates the learning of visual attention and Long Short-Term Memory (LSTM) layers, jointly learning the labels of interest and their co-occurrences while visually attending to the associated image regions. Unlike existing approaches that use either model in their network architectures, training of our model does not require pre-defined label orders. Moreover, a robust inference process is introduced so that prediction errors do not propagate and degrade performance. Our experiments on the NUS-WIDE and MS-COCO datasets confirm the design of our network and its effectiveness in solving multi-label classification problems.
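As a rough illustration of what "visually attending" to image regions can mean in a recurrent model, the sketch below implements generic soft attention: each region feature is scored against the current hidden state, the scores are normalized with a softmax, and the regions are averaged with those weights. This is not taken from the paper. The dot-product scoring and the names `attend`, `region_features`, and `hidden_state` are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Generic soft attention (hypothetical helper, not the authors' code).

    region_features: list of k feature vectors, one per image region.
    hidden_state: current recurrent state, same length as each feature.
    Returns (weights, context), where context is the weighted average
    of the region features.
    """
    # Assumed scoring rule: plain dot product between region and state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[j] for w, region in zip(weights, region_features))
        for j in range(dim)
    ]
    return weights, context


# Toy input: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy input the first and third regions align with the hidden state and receive equal, larger weights than the second. In a full model the context vector would typically be fed to the LSTM together with the previously predicted label.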

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...

DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used in NLP tasks to capture long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the atte...

Saliency-based Sequential Image Attention with Multiset Prediction

Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual atte...

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Large-scale image annotation is a challenging task in image content analysis, which aims to annotate each image of a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically-determined number of class labels. To address th...

Journal:

CoRR

Volume: abs/1707.05495

Pages: -

Publication date: 2017
